10 research outputs found

    Using Zero-Resource Spoken Term Discovery for Ranked Retrieval

    Research on ranked retrieval of spoken content has assumed the existence of some automated (word or phonetic) transcription. Recently, however, methods have been demonstrated for matching spoken terms to spoken content without the need for language-tuned transcription. This paper describes the first application of such techniques to ranked retrieval, evaluated using a newly created test collection. Both the queries and the collection to be searched are based on Gujarati produced naturally by native speakers; relevance assessment was performed by other native speakers of Gujarati. Ranked retrieval is based on fast acoustic matching that identifies a deeply nested set of matching speech regions, coupled with ways of combining evidence from those matching regions. Results indicate that the resulting ranked lists may be useful for some practical similarity-based ranking tasks.
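    The abstract above mentions combining evidence from nested matching speech regions into a single document ranking. A minimal sketch of one plausible combination scheme (a length-weighted sum of per-region acoustic match scores) is shown below; the function name, the input structure, and the weighting itself are illustrative assumptions, not the paper's actual method.

```python
def rank_documents(region_matches):
    # region_matches: {doc_id: [(match_score, region_length), ...]}
    # Hypothetical evidence combination: score each document by a
    # length-weighted sum of its acoustic match scores, so that long,
    # confident matching regions dominate the ranking.
    scores = {}
    for doc_id, regions in region_matches.items():
        scores[doc_id] = sum(score * length for score, length in regions)
    # Return doc_ids sorted by combined score, best first.
    return sorted(scores, key=scores.get, reverse=True)
```

    Any monotone aggregation (maximum region score, a soft-max, a discounted sum) could be substituted here; the point is only that per-region evidence is reduced to one comparable score per document.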

    Neural Language Model Based Attentive Term Dependence Model for Verbose Query (Student Abstract)

    No full text
    Query-document term matching plays an important role in information retrieval. However, retrieval performance degrades when documents match extraneous query terms, a problem that arises frequently in verbose queries. To address this problem, we generate dense vectors for the entire query and for individual query terms using the pre-trained BERT (Bidirectional Encoder Representations from Transformers) model, and subsequently analyze their relation to focus on the contextually central terms. We then propose a context-aware attentive extension of the unsupervised Markov Random Field-based sequential term dependence model that explicitly pays more attention to those central terms. The proposed model utilizes the strengths of the pre-trained large language model to estimate the attention weight of terms and ranks documents in a single pass without any supervision.
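    The core step described above — weighting each query term by how close its embedding is to the whole-query embedding — can be sketched as a softmax over cosine similarities. The vectors below stand in for BERT outputs; the function names and the softmax choice are assumptions for illustration, not the authors' exact formulation.

```python
import math

def cosine(u, v):
    # Standard cosine similarity between two dense vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def term_attention(query_vec, term_vecs):
    # Attention weight per term: softmax over each term embedding's
    # cosine similarity to the whole-query embedding, so terms whose
    # meaning aligns with the full query receive more weight.
    sims = [cosine(query_vec, t) for t in term_vecs]
    m = max(sims)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in sims]
    z = sum(exps)
    return [e / z for e in exps]
```

    In the full model these weights would then scale per-term (and term-dependence) match scores inside the Markov Random Field ranking function.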

    Software-performance evaluation

    No full text
    In some jurisdictions, parties to a lawsuit can request documents from each other, but documents subject to a claim of privilege may be withheld. The TREC 2010 Legal Track developed what is presently the only public test collection for evaluating privilege classification. This paper examines the reliability and reusability of that collection. For reliability, the key question is the extent to which privilege judgments correctly reflect the opinion of the senior litigator whose judgment is authoritative. For reusability, the key question is the degree to which systems whose results contributed to creation of the test collection can be fairly compared with other systems that use those privilege judgments in the future. These correspond to measurement error and sampling error, respectively. The results indicate that measurement error is the larger problem.

    Overview of the FIRE 2011 RISOT Task

    No full text
    RISOT was a pilot task in FIRE 2011 that focused on the retrieval of automatically recognized text from machine-printed sources. The collection used for search was a subset of the FIRE 2008 and 2010 Bengali test collections that contained 92 topics and 62,825 documents. Two teams participated, submitting a total of 11 monolingual runs.

    GRAS

    No full text

    The FIRE 2013 question answering for the spoken web task. Forum for Information Retrieval Evaluation

    No full text
    The FIRE 2013 Question Answering for the Spoken Web (QASW) task was an information retrieval evaluation in which the goal was to match spoken Gujarati questions to spoken Gujarati answers. This paper describes the design of the task, the development of the test collection, the runs that were submitted, and the corresponding results.

    Incremental blind feedback

    No full text

    A Fast Corpus-Based Stemmer

    No full text

    Effective and Robust Query-Based Stemming

    No full text